Sentence-Based Active Learning Strategies for Information Extraction
نویسندگان
چکیده
Given a classifier trained on relatively few training examples, active learning (AL) consists in ranking a set of unlabeled examples in terms of how informative they would be, if manually labeled, for retraining a (hopefully) better classifier. An important text learning task in which AL is potentially useful is information extraction (IE), namely, the task of identifying within a text the expressions that instantiate a given concept. We contend that, unlike in other text learning tasks, IE is unique in that it does not make sense to rank individual items (i.e., word occurrences) for annotation, and that the minimal unit of text that is presented to the annotator should be an entire sentence. In this paper we propose a range of active learning strategies for IE that are based on ranking individual sentences, and experimentally compare them on a standard dataset for named entity extraction.
منابع مشابه
An Efficient Active Learning Framework for New Relation Types
Relation extraction is a fundamental task in information extraction. Different methods have been studied for building a relation extraction system. Supervised training of models for this task has yielded good performance, but at substantial cost for the annotation of large training corpora (About 40K same-sentence entity pairs). Semi-supervised methods can only require a seed set, but the perfo...
متن کاملActive Learning Strategies for Support Vector Machines, Application to Temporal Relation Classification
Temporal relations between events is a valuable source of information which can be used in a large number of natural language processing applications such as question answering, summarization, and information extraction. Supervised temporal relation classification requires large corpora which are difficult, time consuming, and expensive to produce. Active learning strategies are well-suited to ...
متن کاملActive Learning Selection Strategies for Information Extraction
The need for labeled documents is a key bottleneck in adaptive information extraction. One way to solve this problem is through active learning algorithms that require users to label only the most informative documents. We investigate several document selection strategies that are particularly relevant to information extraction. We show that some strategies are biased toward recall, while other...
متن کاملAggregating Inter-Sentence Information to Enhance Relation Extraction
Previous work for relation extraction from free text is mainly based on intra-sentence information. As relations might be mentioned across sentences, inter-sentence information can be leveraged to improve distantly supervised relation extraction. To effectively exploit inter-sentence information , we propose a ranking-based approach, which first learns a scoring function based on a listwise lea...
متن کاملActive Learning with Multiple Annotations for Comparable Data Classification Task
Supervised learning algorithms for identifying comparable sentence pairs from a dominantly non-parallel corpora require resources for computing feature functions as well as training the classifier. In this paper we propose active learning techniques for addressing the problem of building comparable data for low-resource languages. In particular we propose strategies to elicit two kinds of annot...
متن کامل